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ABSTRACT 

Despite considerable interest in the application of land surface data assimilation systems (LDASs) for 
agricultural drought applications, relatively little is known about the large-scale performance of such systems 
and, thus, the optimal methodological approach for implementing them. To address this need, this paper 
evaluates an LDAS for agricultural drought monitoring by benchmarking individual components of the 
system (i.e., a satellite soil moisture retrieval algorithm, a soil water balance model, and a sequential data 
assimilation filter) against a series of linear models that perform the same function (i.e., have the same basic 
input/output structure) as the full system component. Benchmarking is based on the calculation of the lagged 
rank cross correlation between the normalized difference vegetation index (NDVI) and soil moisture esti- 
mates acquired for various components of the system. Lagged soil moisture/NDVI correlations obtained 
using individual LDAS components versus their linear analogs reveal the degree to which nonlinearities 
and/or complexities contained within each component actually contribute to the performance of the LDAS 
system as a whole. Here, a particular system based on surface soil moisture retrievals from the Land Parameter 
Retrieval Model (LPRM), a two-layer Palmer soil water balance model, and an ensemble Kalman filter (EnKF) 
is benchmarked. Results suggest significant room for improvement in each component of the system. 


1. Introduction 

In water limited ecosystems, soil water content in the 
root zone is a strong predictor of future vegetation 
condition (Adegoke and Carleton 2002; Wang et al. 
2007). Therefore, characterization of root-zone soil 
moisture plays a critical role for crop yield forecasting 
and agricultural drought monitoring systems (Bolten 
et al. 2010). The availability of satellite-based remote 
sensing data has accelerated the development of drought 
early warning systems by providing continuous information 
in space and time (Hayes et al. 2012). Nonetheless, the 
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shallow sensing depth (top few centimeters) and un- 
certain accuracy of currently available satellite soil 
moisture retrievals has necessitated the integration of 
hydrologic models and surface soil moisture observa- 
tions from satellites through data assimilation tech- 
niques to obtain more accurate root-zone soil moisture 
estimates. Previous studies have demonstrated that the 
assimilation of surface soil moisture retrievals can im- 
prove the estimation of root-zone soil moisture by a land 
surface model (LSM; Reichle et al. 2002; Crow and 
Wood 2003; Reichle and Koster 2005). Specifically for 
agricultural drought monitoring, Bolten et al. (2010) and 
Bolten and Crow (2012) describe the benefit of assimi- 
lating surface soil moisture retrievals from the Ad- 
vanced Microwave Scanning Radiometer for Earth 
Observing System (EOS; AMSR-E) into the U.S. De- 
partment of Agriculture (USD A) modified Palmer soil 
moisture model. 
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The above-mentioned soil moisture data assimilation 
systems encompass three major components: 1) surface 
soil moisture retrieval from primary observations, 
2) land surface model prediction, and 3) update (anal- 
ysis) of the model-predicted soil moisture using a se- 
quential data assimilation filter. First, remotely sensed 
soil moisture estimates are obtained through appropriate 
retrieval algorithms from primary measurements (e.g., 
brightness temperature from passive microwave sensors 
and backscattering coefficients from active microwave 
sensors). Second, LSMs take meteorological inputs (e.g., 
precipitation and air temperature) and produce soil 
moisture state outputs based on the physics represented 
in the model. Finally, the model-predicted soil moisture is 
constrained by the assimilation of observed soil moisture 
to minimize errors in model state and flux predictions. 

Although a number of previous studies have demon- 
strated the benefits of assimilating satellite-based sur- 
face soil moisture into LSMs for root-zone soil moisture 
prediction, relatively little is known about the relative 
merits of particular retrieval, modeling, and/or data as- 
similation strategies. In particular, it remains unclear 
what level of complexity and/or nonlinearity is appro- 
priate for the system components described above. 
Traditionally, the performance of these components has 
been evaluated by comparing their output with a set of 
independent ground-based data using metrics such as 
root-mean-square error (RMSE) or correlation coefficient. 
However, this typical evaluation approach does not take 
into account the minimum level of expected perfor- 
mance that stems from the inputs into the process. As 
a consequence, it is difficult to objectively determine the 
efficiency of each system component. 

A possible way to overcome this limitation is by ex- 
amining the relative change of a target metric against a 
benchmark set by a competing simpler approach. While 
the typical evaluation scheme computes RMSE against 
observations, for example, the benchmarking approach 
shows a “change” in RMSE versus a competing simpler 
model. In this way, benchmarking allows us to directly 
assess the value of the more complex model. Therefore, 
in order to objectively evaluate the net benefit of any 
nonlinear processes, it is desirable to compare the com- 
plex nonlinear model against a benchmark established 
with a simpler linear competing model (a minimum ref- 
erence level). When this benchmarking approach is ap- 
plied to each component of a data assimilation system, 
the relative weaknesses/strengths of a specific component 
can be diagnosed. 

Recently, the LSM community has paid increasing 
attention to benchmarking approaches for more sys- 
tematic and objective evaluation of model performance 
(Abramowitz 2005; Abramowitz et al. 2008; Abramowitz 


2012; Kumar et al. 2012; Luo et al. 2012). For instance, 
Abramowitz et al. (2008) adopted two statistical models 
as benchmarks against which the performances of dif- 
ferent LSMs were evaluated. The statistical models took 
four meteorological inputs (downward shortwave radi- 
ation, air temperature, specific humidity, and wind speed) 
and were trained to produce the output fluxes (sensible 
heat, latent heat, and net CO 2 exchange) just as the typ- 
ical LSMs do. Since the statistical models do not involve 
any physical mechanisms, they can be used to examine 
how much impact the input variables have on the output 
fluxes and how efficient the nonlinear model physics are 
in augmenting this value (Abramowitz 2005). 

Here we apply a similar benchmarking approach to 
evaluate the performance of each component of a soil 
moisture data assimilation system in current operational 
use for global agricultural drought monitoring. In par- 
ticular, we attempt to assess individual components of 
a drought-monitoring soil moisture data assimilation 
system and benchmark the efficiency of these compo- 
nents relative to simpler, linear retrieval, modeling, and 
data integration strategies. In this way, we hope to im- 
prove our understanding of skill contributed by various 
components of the system and ultimately target specific 
aspects of such systems for improvement. 

The performance of the data assimilation components 
and their benchmarks will be compared using an eval- 
uation metric, the lagged rank correlation between the 
output of each component and the normalized differ- 
ence vegetation index (NDVI), in particular, the 1 -month 
lagged correlation. This evaluation approach was pro- 
posed by Peled et al. (2010) and was applied by Bolten 
and Crow (2012) and Crow et al. (2012b) to quantify the 
skill of different LSMs in predicting variations in vege- 
tative health associated with water stress. The advantage 
of this particular evaluation approach is that it over- 
comes the spatial limitations of traditional evaluation 
approaches for LSMs or retrieval algorithms against in 
situ observations. For instance, satellite-based soil mois- 
ture products are typically evaluated based on comparisons 
with a small number of point-scale in situ measurements 
within a coarse-scale retrieval pixel (e.g., Draper et al. 2009; 
Brocca et al. 2011). These kind of direct comparisons be- 
tween satellite-based and in situ (point) observations are 
necessarily limited in spatial extent (Crow et al. 2012a) and 
have issues such as differences in spatial resolution and 
vertical support (Owe et al. 2001). While using the cor- 
relation with NDVI is somewhat indirect, the benefit of 
this evaluation approach is that it can be applied over 
wide spatial areas and is not limited to the relatively small 
number of sparse sites where high-quality soil moisture 
information can readily be upscaled to match a satellite 
remote sensing footprint (Crow et al. 2012a). 
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Here, remotely sensed surface soil moisture and veg- 
etation optical depth retrievals are obtained from the 
Land Parameter Retrieval Model (LPRM) developed at 
VU University Amsterdam and the National Aero- 
nautics and Space Administration (NASA; Owe et al. 
2008). Likewise, the two-layer Palmer water balance 
model (Bolten et al. 2010) is used as a representative 
LSM. While more modern (and complex) LSMs are 
available, it is worth noting that the two-layer Palmer 
model remains the operational water balance tool ap- 
plied by the USDA Foreign Agricultural Service (FAS) 
for global agricultural drought forecasting. In addition, 
past research suggests that when evaluated by our pro- 
posed NDVI rank correlation metric, increased LSM 
complexity does not necessarily lead to improved root- 
zone soil moisture predictions (Crow et al. 2012b). Fi- 
nally, an ensemble Kalman filter (EnKF) approach is 
applied to assimilate LPRM surface soil moisture re- 
trievals into the Palmer two-layer model. 

The main objectives of this study are 1) to identify 
where strengths or deficiencies exist among each com- 
ponent of an integrated soil moisture data assimilation 
system, 2) to test whether nonlinearity in the model (or 
algorithm) is adding information in the context of 
drought monitoring, and 3) to explore opportunities for 
model improvement by assessing various linear combi- 
nations of available input data. 

2. Data and model 

Our general methodological approach is based on 
reproducing (and then benchmarking) the remote sensing, 
modeling, and data assimilation components of an existing 
operational drought monitoring system. Details on indi- 
vidual components of the system are described below. 

a. LPRM 

In this study, surface soil moisture observations are 
derived from AMSR-E through LPRM. AMSR-E is 
a microwave sensor that operated on board the Aqua 
satellite from 2002 to 2011. Since it was the first satellite 
mission to produce soil moisture as a standard product 
and with a prescribed accuracy goal (Njoku et al. 2003; 
Brocca et al. 2011), its soil moisture products have been 
widely validated against in situ observations over vari- 
ous regions (Wagner et al. 2007; Draper et al. 2009; 
Rudiger et al. 2009; Jackson et al. 2010). Among dif- 
ferent retrieval algorithms, LPRM has been proven to 
perform well, producing relatively high correlations 
with in situ measurements (Wagner et al. 2007; Draper 
et al. 2009; Rudiger et al. 2009). 

The LPRM uses H- and V-polarized brightness tem- 
perature (T bH and T bv ) from either C band (6.9 GHz) 


or X band (10.7 GHz) to retrieve soil moisture (0lprm) 
and vegetation optical depth (r) simultaneously. The 
retrieval methodology consists of an iterative optimi- 
zation of the nonlinear forward radiative transfer model 
to select the h LPRM and r that minimize the difference 
between predicted and measured T bH . The radiative 
transfer model is constrained by parameterizing t as a 
function of the soil dielectric constant, and the micro- 
wave polarization difference index MPDI = (T b jj— T b y)l 
{T bH + T b v ) (Owe et al. 2001; Meesters et al. 2005). The 
nonlinear relationship between soil moisture and soil 
dielectric constant is parameterized based on soil tex- 
ture data according to Wang and Schmugge (1980). 
Furthermore, the surface soil temperature ( T s ), required 
for the radiative transfer equation, is linearly related to 
36.5-GHz V-polarized data. 

C-band data are more sensitive to soil moisture than 
X-band observations, but are more affected by radio 
frequency interferences (RFIs). In this study, C-band 
observations are used for soil moisture retrievals by 
default, except for regions such as the United States 
where C-band RFI is problematic. For a more detailed 
description of the LPRM algorithm, see Owe et al. (2001), 
De Jeu and Owe (2003), and Meesters et al. (2005). 

In this paper, we focus on data from the descending 
(nighttime) orbit that have generally demonstrated 
higher correlation with in situ data (Wagner et al. 2007; 
Draper et al. 2009; Rudiger et al. 2009; Gruhier et al. 
2010). All the input and output data are in gridded 0.25° 
format. The output soil moisture product is expressed as 
normalized volumetric water content (m 2 3 m 3 ) ranging 
from 0.00 to 1.00. However, LPRM output is known to 
be biased high in many locations (e.g., Wagner et al. 
2007), and its optimization loop is therefore purpose- 
fully not capped at expected saturation soil moisture 
values, as this would adversely affect correlation with 
actual soil moisture values. Consequently, its soil mois- 
ture output is more accurately interpreted as a relative 
soil moisture index, rather than an absolute value. Since 
r is a linear function of vegetation water content (De Jeu 
and Owe 2003), it should also be useful for monitoring 
vegetation conditions. Typical values of the r range 
between 0 and 1.3 at C band, and values above 0.75 are 
large enough to decrease the sensitivity of C band to soil 
moisture variation for practical purposes (Owe et al. 2001). 

b. Hydrologic models 

The modified two-layer Palmer model developed 
initially by Palmer (1965) is used to test the efficiency of 
nonlinear hydrologic models. The two-layer Palmer 
model is a relatively simple LSM compared to other 
modern LSMs, but it captures key nonlinear LSM 
characteristics due to finite water holding capacity of the 
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soil layers, water movement between two layers, and 
water loss due to evapotranspiration (ET). Therefore, 
the Palmer model has been used operationally by 
USDA-FAS for agricultural drought monitoring. In 
addition, it is reasonable to take it as a baseline for soil 
moisture assimilation studies considering the marginal 
added skills of other advanced LSMs in predicting veg- 
etation conditions (Crow et al. 2012b). For more ob- 
jective evaluation, the performance of the Palmer 
model is compared with a linear soil water accounting 
model [the antecedent precipitation index (API) model] 
as well as a statistical benchmark model explained in 
section 3b. 

The Palmer model is based on a simple bookkeeping 
method: precipitation replenishes soil water content 
within its layers, and water loss from each layer is de- 
rived by actual ET. The actual ET depends on potential 
ET, the initial water content, and available water ca- 
pacity of the soil layers. The first soil layer of the model 
is assumed to contain 2.54 cm of available water content 
at field capacity. The available water capacity of the 
second layer is determined based on soil texture, depth 
to bedrock, and soil type from the Food and Agriculture 
Organization (FAO) Digital Soil Map of the World 
(Bolten et al. 2010). The second layer is assumed to 
have a “no flow” boundary. The modified Palmer model 
currently operated by the USDA-FAS has an additional 
diffusion term for stronger vertical coupling and gradual 
soil moisture gradients between two layers, which con- 
tributes to better assimilation results (Bolten et al. 2010). 

The potential ET is calculated from the modified 
FAO Penman-Monteith equation using observed daily 
minimum and maximum temperature (Bolten et al. 
2010). Therefore, the major meteorological forcing in- 
puts of the Palmer model are daily precipitation and air 
temperature. Those daily meteorological inputs are de- 
rived from the National Centers for Environmental 
Prediction (NCEP) Global Data Assimilation System 
(GDAS) at 0.25° resolution. The model is run from July 
2002 to June 2011 when AMSR-E observations are 
available. The initial condition is set up after spinning up 
the model three times for the simulation period. Palmer 
model output could possibility be enhanced by adding 
additional processes to the model and/or improved cal- 
ibration of its existing processes. However, at global 
scales, such improvements are hard to implement and 
validate. Instead, our focus here is on evaluating the 
existing Palmer model version currently in use opera- 
tionally at USDA-FAS. 

While the Palmer model serves as a baseline for 
nonlinear hydrologic models, a simpler linear API model 
expressed in Eqs. (1) and (2) is also used for bench- 
marking purposes: 


API y = ^API ; _ 1 . + /’. (1) 

Tv = “-/3(7 max;y -270), (2) 

where Pq is accumulated precipitation on day i and grid / 
and T maxi j is the climatological air temperature (K). 
The global constants a and (i are assigned values of 0.99 
(no unit) and 0.0002 K ', respectively, based on manual 
calibration to maximize the lagged correlation (— 1 
month) of API versus NDVI (explained in section 3a). 

The loss coefficient y of the API model is modified as 
shown in Eq. (2) to reflect varying depletion rates of soil 
water content with daily maximum temperature T max i j. 
This modification allows the API model to run with the 
same meteorological forcing (i.e., daily precipitation 
and maximum air temperature) as the Palmer model. 
Both the Palmer and the API model are prognostic 
models reflecting previous soil moisture conditions 
(memory), while the benchmark statistical model (ex- 
plained in section 3b) is a fully diagnostic model. There- 
fore, additional comparison with the API model enables 
us to evaluate the efficiency of nonlinear model physics in 
the Palmer model more objectively than comparing only 
with the benchmark model. 

c. EnKF 

The EnKF is a well-known sequential data assimila- 
tion technique and has been demonstrated as an ef- 
fective technique for soil moisture assimilation by a 
number of studies (Reichle et al. 2002; Crow and Wood 
2003; Reichle and Koster 2005; Zhou et al. 2006). It is 
based on a statistical Monte Carlo approach in which 
forecast error covariance information is sampled from 
an ensemble of model realizations. In this study, a 
30-member ensemble is initially created by directly 
adding perturbations to a state vector 0 (consisting of 
both surface and root-zone soil moisture values). Each 
ensemble member of the state vector 0^._ 1 is forecasted 
through a nonlinear model operator /*_ i(-) at time 
step k — 1: 


C=4 - M 


i+ 


-V w k-\ 


). 


( 3 ) 


where the plus and minus superscripts refer to the 
updated and forecasted states, respectively, for each 
ensemble member i. The error term w represents all 
uncertainties in the forcing data, model formulation, 
and/or parameterization. It conforms to a Gaussian 
distribution with zero mean and has covariance Q: 


Q a Qp 

aQp crQ \ ’ 


( 4 ) 
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where p indicates the vertical correlation between per- 
turbations applied to each soil layer and is assumed to be 
equal to one in this study. The scalar term a reflects the 
ratio of standard deviations of the root-zone soil mois- 
ture perturbations to the one for surface soil moisture 
perturbations. Here, this ratio is assumed equal to the 
ratio between the available water holding capacity of the 
surface layer and the root-zone layer. We also assume 
the uncertainty in the model forecast is dominated pri- 
marily by the accuracy of the main forcing variable, 
precipitation data. Therefore, Q in Eq. (4) is expressed 
in Eq. (5) as a function of the average distance ( D ) to the 
three closest World Meteorological Organization (WMO) 
rain gauges used to correct satellite-based precipitation 
data: 


' 0.02 2 m 6 m~ 6 
0.04 2 m 6 m~ 6 
0.06 2 m 6 m~ 6 
. 0.08 2 m 6 m~ 6 


if D < 100 km 
if 100 km <D< 150 km 
if 150 km <D< 200 km 
if D > 200 km 


( 5 ) 


The EnKF updates model-predicted soil moisture 
based on relative uncertainties in observations and 
model predictions. Since AMSR-E provides daily soil 
moisture observations, the model predictions are up- 
dated on a daily basis. Before assimilation, the LPRM 
soil moisture retrievals are rescaled through a linear 
transformation to match the temporal mean and stan- 
dard deviation of model-predicted surface soil moisture 
for the 9-yr simulation period in order to put them in the 
same climatology with the model. The predicted state 
variable (0^ _1 ) is then updated by optimally integrating 
observations (0lprm,/c) and model forecasts via 

®/t = ^fc[^LPRM,/t _ +V fc ], (6) 

where the observation operator H = [1 0] and the ob- 
servation error term v' k is a Gaussian noise with mean 
zero and variance R. The Kalman gain (K A .) is deter- 
mined by the cross correlation between the forecasted 
observations and the forecasted state variables (P^Hj) 
and covariance matrix of the forecasted observations 


K* = P*Hj[H*P*H l + R]- 1 . (7) 


Table 1. Assumed observation errors (R 112 ) for different land 
cover. 


Land cover classification 

Std dev of 
obs error (m 3 m -3 ) 

Forests or woodlands 

0.06 

(ENF, EBF, DNF, DBF) 

Wooded grasslands/shrubs (WGS) 

0.05 

or closed bushlands or 
shrublands (CBS) 

Open shrublands (OS) 

0.04 

or grasses (GRS) 

Croplands (CRP) 

0.03 

Bare (BAR) 

0.02 

Water 

0.99 


practice by assigning time constant values of R as a func- 
tion of land cover type. Land cover information is ob- 
tained from the 8-km Moderate Resolution Imaging 
Spectroradiometer (MODIS) land cover classification 
dataset produced by the University of Maryland (http:// 
glcf.umd.edu/). Standard deviations of observation er- 
rors, R m , for each different land cover type are shown in 
Table 1. It should be noted that more complex ap- 
proaches have been proposed to define Q and R in data 
assimilation systems (Crow and Van den Berg 2010; 
Dorigo et al. 2010; Parinussa et al. 2011). However, the 
impact of these new approaches on large-scale data as- 
similation results has not yet been verified (Draper et al. 
2013), nor have they been widely applied yet in opera- 
tional systems. As a result, we choose to apply the simpler 
approaches described above as a baseline for our bench- 
marking procedure. 

From a benchmarking point of view, the EnKF takes 
inputs from the outputs of the hydrologic model (surface 
and root-zone soil moisture) and the observation algorithm 
(surface soil moisture) and produces new (updated) root- 
zone soil moisture estimates (Fig. 1). Because the weight- 
ing underlying these new estimates (supposedly) reflects 
our understanding of time/space variations in model and 
observation errors, EnKF estimates should outperform 
competing approaches based on arbitrary averaging. 
Therefore, the efficiency of the EnKF can be evaluated by 
quantifying the added skill of the updated soil moisture 
compared to a benchmarking model consisting of a simple 
(spatially fixed) linear combination of background model 
predictions and surface soil moisture observations. 

d. Evaluation data 


In this study, the uncertainties of the surface soil 
moisture retrievals from LPRM are specified based on 
the land cover type because of the critical effect of veg- 
etation density on above-canopy brightness temperature 
measurements (Owe et al. 2001). Here, we follow typical 


The performance of each process is evaluated using 
the lagged rank correlation with vegetation condition 
reflected in NDVI. Monthly NDVI data are derived 
from MODIS MOD13C2 products and are aggregated 
to 0.25° resolution from its initial 0.05° resolution. 
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(Palmer, API) 
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Higher 
correlation 
with NDVI 


Data Assimilation 
(EnKF) 


Case 4 


1 


L 


/ OkF, / / flcF.T / 


Fig. 1. Schematic of benchmarking framework (0 lprm: surface soil moisture observation, t: 
vegetation optical depth, 0 S : model predicted surface soil moisture, model predicted root-zone 
soil moisture, Ok?/. updated surface soil moisture, Bkfjz- updated root-zone soil moisture). 


MODIS products flagged as “fully reliable” are ex- 
tracted, and pixels in which more than 50% of the area 
consists of barren, tundra, forest cover, and open water 
surface are masked out to focus our study on areas with 
nonnegligible agricultural and grazing land uses. 

3. Experiment setup 

This experiment aims to benchmark individual com- 
ponents of a soil moisture data assimilation system for 
quasi-global (60°S-60°N) agricultural drought monitoring. 
Analysis is based on monthly data from July 2002 to June 
2011. All daily observed or modeled data are averaged to 
monthly values before the analysis. However, months that 
have less than five daily data points are masked. 

Unlike meteorological inputs for hydrologic models 
(precipitation and air temperature), remotely sensed 
brightness temperature and its processed products from 
LPRM have varying coverage with time because of 
several limiting factors such as dense vegetation, frozen 
surface condition, and active precipitation. For consis- 
tency, only pixels that contain retrievals for all five re- 
mote sensing products {T bv , T b H , T s , $lprm- and T ) are 
included in the analysis. Additionally, barren areas 
where no temporal NDVI variability is observed and 
(presumably energy limited) areas where the two-layer 
Palmer model demonstrated a nonsignificant rank cor- 
relation with NDVI anomalies (at 80% significance) are 
masked out in the present analysis (Crow et al. 2012b). 
Note that one potential reason for this loss in sensitivity 
is the saturation of NDVI in densely-vegetated regions. 
While the strict filtering of the data reduces the number 
of available data for the analysis, especially in the North- 
ern Hemisphere, it strengthens the evaluation technique 


(rank cross correlation with NDVI) by focusing on areas 
and seasons in which soil moisture and NDVI are viable 
drought indicators. 

a. Evaluation metric 

As an agricultural drought indicator, root-zone soil 
moisture estimates can be evaluated by measuring how 
much their anomalies are correlated with subsequent 
NDVI anomalies, an indicator of vegetation condition 
(Peled et al. 2010; Bolten and Crow 2012). Likewise, we use 
this criteria (rank correlation with NDVI) to determine the 
agricultural drought forecasting capability of a certain da- 
taset (i.e., a soil moisture proxy predicted by a physical 
model or benchmarks). It is natural to expect that some 
rank correlation of a particular soil moisture dataset will be 
enhanced as the primary observations or forcing variables 
(e.g., precipitation) go through the nonlinear processes se- 
quentially in the retrieval algorithm, model physics, and/or 
data assimilation step (Fig. 1). Magnitudes of the correla- 
tions allow us to compare the performance of each complex 
process relative to their simpler linear benchmarks. 

The correlation analysis of this study is based on ranks 
because the rank time series are free from seasonality, only 
showing relative wetness of a particular month relative to 
the same month of the year in all other years. The analysis 
starts from ranking monthly anomalies of a certain dataset 
of soil moisture proxy grouped by month of year for the 
analysis period (July 2002 to June 2011). The resulting 
ranks — or Rank(0/ f ) of the dataset for month k — are nor- 
malized so that they are in the ranges between 0.0 and 1.0. 
Therefore, a month that has a rank of zero (one) has the 
driest (wettest) soil moisture condition compared to the 
same month of the year in other years. Monthly NDVI data 
are also ranked in the same way. Figure 2 shows an example 





June 2014 


HAN ET AL. 


1123 


ts =? 

|l 

^ ro 

1 E 


1 

0.9 

0.8 

0.7 

0.6 

0.5 

0.4 

0.3 

0.2 

0.1 



Jul-02 Jul-03 Jul-04 Jul-05 Jul-06 Jul-07 Jul-08 Jul-09 Jul-10 

Fig. 2. Example time series of monthly ranks of LPRM soil moisture estimates and NDVI from 
a 0.25° grid centered at 30.125°N, 109.375°W. 


time series of Rank(fh-) and Rank(NDVb) from a grid in 
North America. 

The lag-L rank correlation coefficient R(L) is cal- 
culated as the Pearson’s correlation between Rank 
(NDVI*) and Rank(0/ f+/ ). In testing the skills of a soil 
moisture dataset as a leading indicator to NDVI, rank 
correlation at L = — 1 (i.e., soil moisture precedes NDVI 
by 1 month) R(— 1) is our primary focus. It should be 
noted that optimal lags to reflect the best NDVI soil 
moisture correlation may be different for different land 
cover types (Musyimi 2011). However, a lag time of 
1 month is consistent with the monthly operational cycle 
of many agricultural drought monitoring activities. In 
addition to the spatial masking described above, months 
having maximum air temperature below 5°C are ex- 
cluded to minimize the impact of cold-season condi- 
tions. That is, snow dominated regions or seasons are not 
the focus of the present study. Finally, for each pixel, the 
correlation is calculated only if there are at least 30 pairs 
of RanktNDY'b) and Rank(0/ f _/ ) to obtain statistically 
significant results. 

To test significance of the sample correlation co- 
efficient, the sample variance (cr 2 ) of a rank correlation 
R(L) for a single pixel is estimated through a Fisher 
transformation (Von Storch and Zwiers 2002) as 
follows: 

F[R(L)] = 1 ln{ [1 + R(L)]/[1 - R(L)]} . ( 8 ) 

This transformation converts the sample correlation 
into a normal distribution with variance cr F = 1 l(N t — 3) 
(Fieller et al. 1957). The number of the temporal degrees 
of freedom (A)) in the time series is computed by con- 
sidering lag-1 temporal autocorrelation of soil moisture 
estimates (p, g ) and NDVI ( p t NDVI ) (Dawdy and Matalas 
1964): 


n t x [(1 Pt,ePt, ndvi)^ + Pre^f.NDVi)] » (9) 

where n, is number of monthly data in the time series. 

The sample variance of a rank correlation R{L) in 
Fisher space (cr|) can be converted into a regular space 
as follows: 

CT 2 = 4(sech 2 {F[R(L)]}) 2 . (10) 

Likewise, the sample variance of spatially averaged 
R(L) in space can be estimated by dividing the spatial 
average of Eq. (10) for each pixel by the number of ef- 
fective spatial degrees of freedom ( N s ) that are averaged 
across 

V s =n s X[(l- Ps )/(l + Pi )] 2 , (11) 

where n s is the total number of pixels averaged over and 
p s is the lag-1 spatial autocorrelation of the field. These 
sample variance values are used to construct error bars 
(±1 standard deviation) in Figs. 3, 4, and 5. The statis- 
tical significance of an R{— 1) difference for a single pixel 
between soil moisture outputs from each data assimila- 
tion component and their benchmarks can be expressed 
via the Z score: 

Z = (F[R 0 (-1)] - F[/? BM (-l)]}/^ 0 + t7 2 3M . (12 ) 


b. Benchmarking models 

Our benchmarking approach is based on a linear mul- 
tivariate regression models. For each component of the 
data assimilation system (i.e., observation, modeling and 
assimilation), an empirical relationship between inputs 
and outputs is established. A general expression of the 
benchmarking model with p independent variables is 
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^LPRM * T * S PM * S AP1 &PM-KF ^API-KF 


Fig. 3. Spatial average of R(L) for the LPRM retrievals (0lprm and t), model predictions (0pm and 0api), and 
updated model outputs (0pm-kf and ^api-kf) within the training (60°S-30°N) and testing (30°-60°N) areas. 


BM(X 1? ... , X [ ) ij - a x X' Xi j + a 2 X' 2ij + ■ ■ ■ + a p X' pij + , 

(13) 


where X' p L - is the anomaly of each input variable p for 
month i and pixel j normalized by its temporal standard 
deviation [(X p ij — X p f)/a p j]. The parameters (a \, ... , a p ) 
are constant in time and space and imply the relative 
contribution (weight) of each independent input 
variable in predicting the integrated soil moisture 
proxy BM(Xi, ... , X p ) tJ . The time series of the resulting 
soil moisture proxies are ranked and used to compute 
R(L ) with NDVI. 

In setting up the relationship, the objective function is 
the spatial average of R(— 1) or correlation between the 
statistical model predictions and NDVI at L = —1. That 
is, our goal in training the parameters is to achieve the 
highest spatial average of the lagged rank correlation 
with next month’s NDVI. We reserve the extratropical 
Northern Hemisphere (ETNH, 30°-60°N) to test the 
empirical models and use the remaining area of the 
globe (60°S-30°N) to train the parameters. This spatial 
segregation allows us to have exclusively independent 
datasets for training and testing purposes. Note that our 
goal is only to obtain a single spatially and temporally 
constant set of parameters («i, . . . , a p ) for the entire globe. 
We take this very conservative approach to find a minimum 
reference level (benchmark) even though much better 
benchmark model performance is possible if tuned pa- 
rameters in Eq. (13) are allowed to vary in time and space. 

Unlike the nonlinear models such as LPRM or the 
Palmer model, this regression model combines available 
inputs (X’ li ~X pi ) in a linear way and produces a soil 
moisture proxy for each month without any knowledge 
of the physical relationships between the input and 


output variables. In addition, this statistical model is a 
fully diagnostic model and therefore ignorant of infor- 
mation concerning the prior status of the variables (like 
LPRM, but unlike the Palmer model). 

For the individual components of the data assimila- 
tion system, we define a series of benchmarking models 
as follows. 

Case 1: The purpose of Case 1 is to test the efficiency of 
LPRM soil moisture retrievals versus a benchmark 
model consisting of a linear combination of inputs into 
LPRM (Fig. 1). For Case 1, our focus is on comparing 
the 1-month lagged rank correlation of $lprm versus 
NDVI — denoted as R(— 1) — versus the analogous 
rank correlation of a benchmarking model based solely 
on a simple linear combination of T byH , T bv , and T s or 

BM(7' v , T b v , T bH ) i = «| i + a 12 T b jii 

+ a l,3 T by,i + B l 1 i- ( 14 ) 

Case 2: A defining feature of LPRM is that it produces 
t as an intermediate product. Since r reflects the 
presence of vegetation water content, it contains 
drought-relevant information. Based on this hypoth- 
esis, another way of testing the efficiency of the 
LPRM is to compare the information of both LPRM 
outputs (0lprm and r, after combining them into 
a single variable) against a benchmark model derived 
from LPRM inputs. Therefore, to represent complete 
skills from the two outputs, we use the same multiple 
linear regression model in Eq. (15) and compare it 
with the regression model from the inputs in Eq. (16): 

LR(0Lpjuyp 7 )i = fl 2,l®LPRM ,/ d" a 2,2 T i e 2 ,i 0'^) 
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Fig. 4. Spatial average of R(L) for the training (60°S-30°N) and testing (30°-60°N) areas for Cases 1-3. Subplots 
show relative R(L) difference (%) of retrieval (or model) outputs to benchmarks indicated by the superscript (*) in 
the legend {e.g., 1OO*[R(L)0 lprm - R(L)6bm]/R(L)9bm} ■ Note for LR(6bprm, t) in (a) relative R(L) differences are 
computed using the BM(T S , MPDI, T b y) defined in Eq. (16). 


BM(7' V . MPDI, T by ) i = a 221 T' si + a 222 MPDl' 

+ a 222 T bVi + s 22i . (16) 

By comparing (15) and (16), we can get a sense of 
the value of 0lprm and t outputs — combined via 
Eq. (15) — versus the simple combination of LPRM 
inputs in Eq. (16). In this case, the benchmark 
model in Eq. (16) uses MPDI as an input variable 
because r is a function of the MPDI. The MPDI 
removes the effect of soil temperature from the 
microwave emission signal (Owe et al. 2001). In 
addition, T bv is used rather than T bH because a 
preliminary analysis (not shown) demonstrated that 
T h v has a higher correlation with NDVI than T b H . 
It is also interesting to examine the marginal value 
of r retrievals, above and beyond 0lprm, for agri- 
cultural drought monitoring. We can achieve this 
objective by comparing R(— 1) of the predictions 


from Eq. (15) with the R( — 1) for only 0lprm- 
Case 3: The performance of a nonlinear hydrologic 
model is evaluated by comparing it with a benchmark 
model based on the linear combination of its inputs 
[i.e., daily maximum air temperature (7 m;(X ) and 
daily accumulated precipitation (P)\: 


BM(T 


s’ B)/ a 3J T’rnax.i + fl 3,2 ^ i + e 3 ,i • 


(17) 


Therefore, the key comparison in Case 3 is between 
R(— 1) for Palmer and/or API model predictions of 
root-zone soil moisture versus R(— 1) for the soil 
moisture proxy obtained via this benchmark model. 
Note that the prognostic structure of the API and 
Palmer model should provide a natural advantage 
over the purely diagnostic form of Eq. (17). 

Case 4: The EnKF is designed to optimally combine 
model predictions (0pm or 0 api) an d observations 
(0lprm) using information about their relative 
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uncertainties. As an alternative to the standard 
EnKF, one can construct a benchmarking model 
that does not have any prior knowledge of those 
uncertainties and instead applies (globally constant) 
weights to merge model-predicted soil moisture (0 PM 
or 0 API ) with remotely sensed soil moisture retrievals 
(0lprm): 

BM(0 pm , 0 LPRM )j = a 4 j0pM,i + fl 4,2^LPRM,i + s 4j 

(18) 

BM(0api, ^lprivi), = a 42J ® API,/ + fl 42,2^LPRM,( + e 42,i‘ 

(19) 

Note that the EnKF uses different uncertainties for 
each pixel (i.e., each pixel has its own model and 
observation error depending on land cover and 
distance from WMO rain gauges). In contrast, the 
benchmarking model applies spatially and tempo- 
rally constant weights for the Palmer model pre- 
dictions (0pm) and observations from LPRM (0lprm) 
obtained from a training dataset. Comparing 
R( — 1) for soil moisture proxy obtained via Eqs. (18) 
or (19) with R(— 1) for a soil moisture analysis 
(acquired using a Palmer or API model-based 
EnKF) will evaluate how efficiently the EnKF is 
implemented. 


4. Results 

Figure 3 shows spatially averaged lagged R(L) results 
for each component of an offline agricultural drought 
monitoring system (LPRM soil moisture and vegetation 
optical depth retrievals, open loop model predictions, 
and an EnKF analysis) for the training and testing da- 
taset separately during the 9-yr period from July 2002 to 
June 2011. Although LPRM soil moisture retrievals 
(0lprm in Fig. 3) reflect only surface soil moisture con- 
dition, it demonstrates nearly as large a R(— 1) as root- 
zone soil moisture estimates obtained from the EnKF 
analysis (0 P m-kf or 0 api-kf)- In contrast, the Palmer 
model-predicted root-zone soil moisture (0 PM ) has rel- 
atively low R(— 1) compared to the other soil moisture 
estimates. Most notably, it fails to match values obtained 
with the simpler API model. Error bars constructed 
using Eqs. (8)— (11) are added to each point in Fig. 3 to 
reflect lcr sampling uncertainty. Relatively larger error 
bars for API results are due mainly to its higher tem- 
poral autocorrelation (and thus reduce temporal degrees 
of freedom) relative to other products. 

For each case introduced above, Figs. 4 and 5 compare 
the R(L ) of each soil moisture product versus its 


appropriate benchmark in both the training and testing 
datasets. As shown in Crow et al. (2012b), semiarid areas 
have very strong correlation [R(— 1) > 0.5] between soil 
moisture and NDVI due to water-limited plant growth. 
Therefore, the large fraction of semiarid areas found in 
the Southern Hemisphere training dataset ensures that 
the average R(— 1) is higher in the training set than in the 
Northern Hemisphere testing set. An in-depth explanation 
of results in Figs. 3, 4, and 5, focusing mainly on the 
testing set, is given below. 

Spatially different coupling strengths between soil 
moisture and NDVI due to different climate conditions 
can be found on the maps of R(— 1) for the testing area 
(ETNH, 30°N - 60°N) in Fig. 6. In relation to climatic 
region, there are distinct differences in the magnitude of 
R(— 1) for different land cover types, as shown in Fig. 7. 
Land cover classification and their areal fractions are 
provided in Table 2 for both the training and testing 
areas. Error bars in Fig. 7 were computed using Eqs. (8)- 
(11). Note that two land cover types [evergreen broad- 
leaf forests (EBF) and deciduous needleleaf forests 
(DNF)] are not displayed in Fig. 7 because of their small 
portions (Table 2) and large sample variance. Most 
forested areas show weak relationships between soil 
moisture and NDVI [R(— 1) < 0.2], which makes sense 
because trees take water from deeper soil and are more 
resilient to surface soil moisture shortage than shrub or 
grass. Shrub and grasslands that correspond to the 
semiarid climate have strong correlations with NDVI 
[R(~ 1) > 0.4]. Croplands have relatively lower R(— 1) 
(—0.3) than the shrub or grasslands, likely because ar- 
tificial human interference for agricultural practices 
such as irrigation and the installation of tile drainage 
systems may disturb the direct coupling between soil 
moisture and NDVI in managed agricultural landscapes. 

Below, we define the efficiency of a given model 
monitor component (e.g., the soil moisture retrievals 
algorithm, the LSM, and the data assimilation system) as 
the difference between the R(— 1) of soil moisture out- 
put provided by each component and that is obtainable 
using its corresponding linear benchmark. This effi- 
ciency provides a means to evaluate the added value of 
nonlinear and/or complex processes embedded in each 
component relative to a baseline established by the 
simple benchmarks in Eqs. (14)-(19). Note that while 
each benchmark model is based on fitted parameters, 
these parameters are static in time and space and are 
fitted to a spatially distinct training dataset. 

a. Case 1: Efficiency of the LPRM retrieval algorithm 

For Case 1, comparisons between the benchmark 
model in Eq. (14), based on a linear combination of 
primary AMSR-E observations (T b v , T bH , and 7’ v ). and 
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Fig. 5. Spatial average of R(L) for the training (60°S-30°N) and testing (30°-60°N) areas for Case 4. Subplots show 
relative R(L) difference (%) of updated outputs to benchmarks indicated by the superscript (“*”) in the legend (e.g., 
lOO*[i?(L)0 PM -KF — R(L)Sbm]/R(L)Sbm)- 


the full LPRM soil moisture product (0lprm) are de- 
cidedly mixed. Averaged over the testing domain, 
the benchmark model produces roughly the same R(— 1) 
as the full LPRM soil moisture product (Table 3). For 
all lags other than L = —1, the benchmark model 
slightly outperforms the LPRM soil moisture product in 
predicting vegetation conditions (Fig. 4a-2). How- 
ever, when focusing just on R(— 1) results for individual 
land cover types within the testing area, LPRM out- 
performs the benchmark model for most land cover 
types other than open shrublands (OS), closed bushland 


or shrublands (CBS), deciduous forests (DBF) and bare 
soil in Fig. 7, and the Z-score map in Fig. 8a illustrates 
that LPRM is marginally superior to the benchmark 
model over broad areas of Europe and the eastern 
United States. 

The evaluation approach in the present study adopted 
a benchmark model as a minimum reference level to 
evaluate the performance of the LPRM retrieval algo- 
rithm. Even though LPRM soil moisture has been 
shown to correlate well with in situ observations of soil 
moisture (Draper et al. 2009; Brocca et al. 2011), results 



Fig. 6. Map of R(— 1) for LPRM soil moisture estimates within the testing area. 
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FlG. 7. Spatial average of R( — 1) of various soil moisture products within different land cover 
types in the ETNH (30°-60°N). See Table 2 for definitions of land cover acronyms. 


here imply that LPRM benefits marginally from its 
nonlinear parameterization when judged against our 
NDVI validation metric. As a result, it may be possible 
to simplify (and/or linearize) the LPRM algorithm 
without adversely affecting its value for drought fore- 
casting. Nevertheless, it should be stressed that in this 
study we benchmark the “quality” of soil moisture infor- 
mation only for a very specific application — forecasting 
the impact of agricultural drought on vegetation health — 
and these results might be application specific. 

b. Case 2: Efficiency of the LPRM retrieval algorithm 
and added skill of r 

Case 2 is designed to investigate if LPRM r estimates 
can add robust skill to agricultural drought monitoring. 
First, it should be noted the difference in measurements 
of NDVI and t. In principle, NDVI observation is 
strongly influenced by the chlorophyll concentration in 


vegetation and is thus related to green leaf biomass 
(Owe et al. 2001; Jones et al. 2011). However, r is ob- 
tained based on the vegetation dielectric properties that 
are strongly related to vegetation water content in both 
foliage and woody biomass (Liu et al. 2011; Andela et al. 
2013). Therefore, the NDVI/r relationship is not known 
explicitly, even though both of them reflect some aspects 
of vegetation condition and NDVI was used to validate 
the t in Owe et al. (2001). 

From Fig. 3, the t rank anomaly is significantly less 
successful (compared to surface soil moisture observations) 
as a leading indicator for future NDVI anomalies, es- 
pecially in testing areas. Interestingly, however, the t 
has strongest correlation with NDVI at L = 1 [i.e., 
Rank(rO temporally lags behind Rank(ND Y'L) by 1 
month]. Therefore, t information is more suitable for 
a retrospective analysis of NDVI rather than fore- 
casting. Nonetheless, the r shows relatively high R(— 1) 
with NDVI for several land cover types. For example, 


Table 2. Land cover classification in the training and testing datasets. Pixels where all five remote sensing retrievals (T h ,v, Tb n, T s , 
0 LPR M, and t) are available and monthly maximum air temperature is above 5°C are included in the present analysis. Thus, many forested 
or frozen areas are excluded. 


Land cover type 

Training area (60°S-30°N) 
No. of pixels 

Testing area (30°-i 
No. of pixels 

60°N) 

Evergreen needleleaf forests (ENF) 

6155 

0.75% 

70485 

10.88% 

Evergreen broadleaf forests (EBF) 

51983 

6.31% 

112 

0.02% 

Deciduous needleleaf forests (DNF) 

1 

0.00% 

2433 

0.38% 

Deciduous broadleaf forests (DBF) 

17 695 

2.15% 

27 926 

4.31% 

Mixed forests (MF) 

2640 

0.32% 

42 916 

6.63% 

Woodlands (WL) 

154142 

18.72% 

13 933 

2.15% 

Wooded grasslands/shrubs (WGS) 

161 786 

19.65% 

14795 

2.28% 

Closed bushlands or shrublands (CBS) 

88 576 

10.76% 

8328 

1.29% 

Open shrublands (OS) 

136474 

16.58% 

76824 

11.86% 

Grasses (GRS) 

51441 

6.25% 

192 866 

29.77% 

Croplands (CRP) 

77330 

9.39% 

160 329 

24.75% 

Bare (BAR) 

74993 

9.11% 

36834 

5.69% 
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Table 3. Optimized benchmark models and their performance. Key comparisons for each case are in bold. 


Optimized parameters R(— 1) 


Case 

Model/algorithms/regression equation 

a *,l 

a *.2 

a *,3 

Training 

Testing 

1 

^LPRM 

— 

— 

— 

0.388 

0.301 


a U T 'sj + a L2 T 'b,H,i + fl 1.3 T 'b, V,i 

0.5 

0.8 

-1.7 

0.420 

0.299 

2 

T 

— 

— 

— 

0.330 

0.166 


a 2,l e LPRMj + «2,2T' 

0.25 

0.12 

— 

0.412 

0.307 


«22,1 T' s . + a 2 2,2MPDI' + 022,3 T' b v i 

0.5 

-0.6 

-1.3 

0.417 

0.298 

3 

^PM 

— 

— 

— 

0.277 

0.266 


^API 

— 

— 

— 

0.329 

0.304 


“3,1 +03.27) 

-0.19 

0.52 

— 

0.241 

0.223 

4 

^PM-KF 

— 

— 

— 

0.380 

0.305 


^API-KF 

— 

— 

— 

0.390 

0.320 


04,1 ^PM,/ + a 4,20LPRM,i 

0.28 

0.89 

— 

0.395 

0.319 


042,1 ^API,/ + a 42,20LPRMj 

0.32 

0.57 

— 

0.411 

0.339 


land cover types such as CBS, OS, and grasses (GRS) 
have R(— 1) greater than 0.25, but for forested areas and 
even crop land, r shows little relationship with next 
month’s NDVI (Fig. 7). 

Because of low R(L ) for negative L, it is expected that 
the r does not add significant value to the prediction of 
future vegetation conditions. Therefore, when 0lprm 
and t are linearly combined into LR(0 L prm, t) through 
Eq. (15), it produces only marginally higher R(— 1) than 
the soil moisture retrievals only [0.307 for LR(0 lprm , t) 
and 0.301 for (?lprm only] for testing dataset as shown in 
Fig. 4a-2. However, because of the lagged nature of the t 
response versus NDVI, the addition of r does seem to 
increase R(L ) for L > 0. 


In the context of evaluating the efficiency of nonlinear 
LPRM retrieval algorithm, the linear combination of the 
two LPRM outputs (0 lprm and r) expressed in Eq. (15) 
is compared to the linear regression model of the three 
LPRM inputs ( T s , MPDI, and T b v ) in Eq. (16). The 
combination of 0 L prm and r barely outperforms the 
benchmark model in Eq. (16) for the testing dataset 
(0.307 versus 0.298 in Table 3). Since t is a function of 
MPDI, MPDI rather than T bH is used in Case 2 for 
creating the benchmarking model. However, R( L) of 
the MPDI does not correspond well to the R(L ) of the 
r (not shown), and the use of MPDI as a predictor of the 
benchmarking model does not make much difference 
compared to the performance of the benchmark model 



FIG. 8. Maps of Z score for (a) Case 1 [«(-1 ) 9lpem - «(-l)e BM1 ], (b) Case 2 [R( -l) L R,e LPRM , T) - 
(<0 Case 3 [«(-l) epM - *(-!),„], and (d) Case 4 - R(~ l) eBM J. 
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in Case 1 [Eq. (14)]. A plot of spatially averaged K(L) 
for the benchmark in Eq. (16) is very similar to that of 
the Case 1 benchmark in Eq. (14). Therefore, R(L) of the 
benchmark of Case 2 are not shown in Fig. 4a, but the 
relative difference of LR(0 lprm , t) was computed based 
on its benchmark in Eq. (16). 

When spatially averaged, r does not seem to add sig- 
nificant skill to NDVI prediction. However, for certain 
land cover types such as CBS and OS, the combination 
°f ^lprm and r contributes to an increase in R(— 1) from 
the soil moisture-only case because of the high R(— 1) of 
t (Fig. 7). This result might be explained by the differ- 
ence in what r and NDVI actually measure; NDVI is 
sensitive to the chlorophyll concentration in the canopy 
while t is sensitive to the water content both in foliage 
and woody biomass. Even though total above-ground 
biomass represented in r decreases with a lack of pre- 
cipitation, the NDVI may show lagged response because 
green canopy cover is maintained in grassland or 
shrubland land covers during drought (Liu et al. 2011). 
Therefore, the t information may be more useful for 
open and closed shrubland vegetation types. This is also 
confirmed by the Z-score map in Fig. 8b. When 0lprm 
and t are combined into LR(0 L prm, t) via Eq. (15), 
certain relatively arid regions (e.g., the western United 
States and northern Africa) are converted from negative 
to positive Z scores, indicating an improvement in per- 
formance relative to the benchmark (cf. Figs. 8a, b). 

c. Case 3: Efficiency of hydrologic models 

Case 3 examines the efficiency of a nonlinear hydro- 
logic model by benchmarking Palmer model output 
against a linear regression model — given in Eq. (17) — 
with independent variables matching the meteorological 
inputs of the Palmer model (precipitation and daily 
maximum air temperature). If more complicated LSMs 
are evaluated, additional model inputs can be added to 
Eq. (17). Since soil moisture retains memory from the 
previous model time step, any kind of prognostic model 
should possess a natural advantage over a fully di- 
agnostic model like the benchmark model in Eq. (17). 
Therefore, for more objective evaluation, performance 
of the Palmer model is also compared with a purely 
linear prognostic model (API model) as well as the 
benchmark in Eq. (17). The Palmer model produces soil 
moisture condition for each of the two layers. To mimic 
an integrated root-zone estimate, we use averages of the 
two soil moisture outputs weighted by the water holding 
capacity of each layer. 

The Palmer model performs relatively poorly in pre- 
dicting future vegetation dynamics as shown in Fig. 3. 
The values of R(— 1) of the Palmer model are 0.277 
and 0.266 for training and testing dataset, respectively 



Fig. 9. Relationship between correlation with NDVI and runoff 
generation using the API model with different settings of water 
holding capacity. The original water holding capacity (whe) used 
for the Palmer model is indicated as whc*l. Global average runoff 
estimate is based on Trenberth et al. (2007). 

(Table 3), and are significantly worse than the perfor- 
mance of the API model (0.329 and 0.304 for the training 
and testing dataset, respectively). However, the bench- 
mark model in Eq. (17) is even worse than the Palmer 
model [R(— 1) < 0.25]. As noted above, this is likely due 
to its diagnostic model and lack of memory regarding 
past soil moisture conditions. In Fig. 4b, the gap between 
the benchmark in Eq. (17) and the other prognostic mod- 
eling approaches reiterates the importance of representing 
month-to-month memory when deriving soil moisture 
proxies for agricultural drought monitoring. 

The Z-score map in Fig. 8c also shows the relatively 
poor performance of the Palmer model compared to the 
API model serving as a benchmark. Looking at R(— 1) 
for each land cover type in Fig. 7, soil moisture varia- 
tions in forested areas again demonstrate a weak cor- 
relation with NDVI, which is attributed to resilience of 
forest biomass to variations in soil moisture rather than 
weakness of a specific observations or model products. 
The root-zone soil moisture outputs from the Palmer 
model do not even catch up with the performance of 
LPRM surface soil moisture retrievals for most land 
cover types [wooded grasslands/shrubs (WGS), CBS, 
OS, GRS, and croplands (CRP)] in which agricultural 
drought monitoring is crucial (Fig. 7). This supports the 
earlier finding of Bolten and Crow (2012) that existing 
satellite-based soil moisture products provide at least as 
much global agricultural drought information as a sim- 
ple, offline water balance model driven by available 
global precipitation datasets. 

Based on comparisons with linear API results, the 
Palmer model does not appear to be fully utilizing the 
precipitation and air temperature information it requires 
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Fig. 10. An example spatial distribution of monthly averaged Kalman gain (K*.) from the 
Palmer-EnKF system (July 2003). 


as input. This implies that the nonlinear physics of the 
Palmer model are somehow squandering information 
present in its input. To pinpoint the nonlinearity re- 
sponsible for this effective loss of information, we con- 
strained the API model using different water holding 
capacity values as shown in Fig. 9. The original fully linear 
API model predicts no runoff but has a globally averaged 
R(— 1) of 0.294. Constraining the water holding capacity 
of the model to produce a realistic amount of global 
runoff [based on Trenberth et al. (2007)] leads to a sharp 
degradation in R{— 1) results. This suggests that the 
addition of a simple saturation threshold to the API 
model, required to generate surface runoff and any type 
of reasonable streamflow prediction, tends to decrease 
the utility of API predictions as a predictor of coarse- 
scale agricultural drought. As such, it provides a direct 
example of the so-called land surface model multi- 
objective parameterization problem (Yapo et al. 1998; 
Vrugt et al. 2003), whereby LSM parameterizations re- 
quired to minimize error in one type of model output 
(e.g., runoff predictions) are often at odds with the op- 
timal parameterization required for a second type of 
output (e.g., root-zone soil moisture monitoring for ag- 
ricultural drought). Consequently, for coarse-scale agri- 
cultural drought monitoring, it appears there is no utility in 
enforcing a nonlinear saturation limit on soil moisture 
dynamics. Since such a threshold is required to generate 
surface runoff, these results also call into question the 
practice of monitoring agricultural drought using pre- 
dictions from a land surface model calibrated using 
streamflow data. 

d. Case 4: Efficiency of the EnKF 

The two benchmarks in Eqs. (18) and (19) linearly 
combine anomalies of the model-predicted root-zone 
soil moisture (from a prognostic model) and observed 
surface soil moisture. Because of the relatively poor 
performance of the Palmer model in Case 3 and rela- 
tively strong correlation of LPRM soil moisture re- 
trievals with NDVI, the optimized regression coefficient 
for LPRM retrievals is 3 times higher than the coefficient 
for Palmer model predictions (0.89 versus 0.28 in Table 
3). The better performance of the API approach relative 
to the Palmer model in Fig. 3 causes relatively more 


weight to be placed on the model, but LPRM soil mois- 
ture retrievals are still given almost 2 times more weight 
than API model predictions (Table 3). 

The results of Case 4 in Table 3 and Figs. 5 and 8d 
show that the benchmarks outperform the outputs from 
the EnKF for both the Palmer and the API model. In 
other words, this benchmarking evaluation indicates an 
inefficient implementation of the EnKF systems for both 
the Palmer and API model. In addition, when the EnKF 
results of the Palmer model are compared with the 
performance of the LPRM soil moisture only (Fig. 3), 
we can see that most of the predictive skill of the EnKF 
analysis is already present in the assimilated observa- 
tions; R( — 1) = 0.301 for Olprm while R(— 1) = 0.305 for 
0PM-KF in Table 3. This suggests that the background 
model is contributing relatively little information to the 
analysis. For several land cover types such as WGS, 
CBS, GRS, and CRP, where soil moisture observations 
have relatively stronger correlations with NDVI, the 
benchmarks outperform the EnKF in Fig. 7. 

In theory, the EnKF should assign optimal weights to 
background model predictions and observations and 
therefore outperform simple weighting based on spa- 
tially fixed parameters [Eqs. (18), (19)]. However, in 
practice, it is very difficult to specify those weights op- 
timally for each grid because of the complicated nature 
of error sources in model and observed soil moisture 
products (Crow and Van den Berg 2010). Here, R and Q 
are assigned in a relatively simple way based on land 
cover and distance from WMO rain gauges, respectively, 
following Bolten et al. (2010) and Bolten and Crow 
(2012). However, comparisons between the Kalman 
gain map (based on specified R and Q) and optimized 
weights determined by the benchmark models suggests 
that R and Q have not been optimally specified. In 
particular, a spatially distributed Kalman gain map for 
the surface soil moisture in Fig. 10 suggests that the 
EnKF places excessive weight on the model background 
in the United States and Europe. 

An additional assumption applied in this study is 
that of perfect vertical error correlation [i.e., p = 1 in 
Eq. (4)]. However, this assumption may not be always 
true for certain soil conditions. In that case, the Kalman 
gain in Eqs. (6) and (7) may link forecasted surface 
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observations with the root-zone soil moisture prediction 
inappropriately, which may also contribute to the appar- 
ently suboptimal performance of the Bolten and Crow 
(2012) EnKF system. 

5. Discussion and summary 

Over the past decade, soil moisture data assimilation 
has been actively investigated in the land surface mod- 
eling community as a tool for improving operational 
drought monitoring. However, the various soil moisture 
retrieval algorithms, land surface models, and data as- 
similation approaches comprising such a system have 
generally been evaluated separately. Therefore, their 
relative benefits within a comprehensive data assimila- 
tion system have never been fully assessed. In addition, 
traditional validation approaches based on direct com- 
parison against ground-based observations have limited 
the evaluation of these systems to isolated, data-rich 
areas where sufficient ground-based observational re- 
sources are available. 

Here, all three major components of a soil moisture 
data assimilation system (retrieval algorithm, hydro- 
logic model, and an assimilation technique) for agri- 
cultural drought monitoring are evaluated against a 
series of benchmarks derived from linear statistical 
models. This benchmarking approach is designed to 
assess the value of complex and/or nonlinear processes 
directly by comparing them with the performance of 
simple linear models utilizing the same set of input 
variables. In addition, this study uses a novel evaluation 
metric, rank correlation with NDVI from Crow et al. 
(2012b), to quantify the predictive skill of various soil 
moisture proxies over broad continental-scale regions. 

First, the efficiency of LPRM, a soil moisture retrieval 
algorithm, was evaluated in Case 1. The LPRM soil 
moisture product marginally outperformed a benchmark 
model constructed directly from AMSR-E radiance 
measurements. It should be stressed that the evaluation 
result of this study does not directly evaluate the in- 
trinsic accuracy of the LPRM soil moisture products, but 
rather how much skill they posses for a single, specific 
application (the forecasting of future vegetation condi- 
tions). Nevertheless, in this specific context, the non- 
linear LPRM retrieval algorithm does not appear to add 
much additional predictive information compared to a 
corresponding linear benchmark constructed using LPRM 
inputs. 

In Case 2, vegetation optical depth (t), the additional 
AMSR-E retrieval product of LPRM, shows a potential 
for the retrospective analysis (as opposed to forecasting) 
of vegetation condition since it demonstrates the high- 
est correlation coefficient with NDVI at L = 1 [i.e., 


RankfLt) temporally lags behind Rank(NDY'L) by 1 
month]. The lagged response of r to NDVI is consistent 
with the observations of Jones et al. (2011) made with 
regards to plant phenology. However, the relationship 
between r and NDVI is not fixed across different land 
cover types (Liu et al. 2011). For L = —1, however, 
neither t nor MPDI add any information over and above 
the benchmarks defined in Eqs. (14) or (16). 

In Case 3, the modified two-layer Palmer model is 
evaluated by comparing its performance to the linear, 
prognostic API modeling approach illustrated in Eqs. 
(1) and (2) as well as a diagnostic benchmarking model 
in Eq. (17) based on the same meteorological forcing 
input utilized in the Palmer and API models. The 
Palmer model outperforms the diagnostic benchmark 
model in Eq. (17), but only because of its prognostic 
nature. More importantly, the poor performance of the 
Palmer model compared to the API implies the in- 
efficiency of the nonlinear physical characteristics in the 
Palmer model. Crow et al. (2012b) also used a similar 
API model as a baseline to evaluate modern LSMs 
based on the same evaluation criteria for agricultural 
drought monitoring and showed that those complex 
LSMs do not produce significantly higher R(L) than 
the API model. Based on the superior performance of 
the API model, we hypothesize that the finite satu- 
ration threshold of soil layers in the Palmer model 
may actually hinder its utilization of meteorological 
input information for agricultural drought monitor- 
ing. However, considering the relative simplicity of 
the models considered here, further research using 
more complex LSMs is required to better understand 
this issue. 

Although Bolten and Crow (2012) showed the added 
benefit of assimilating surface soil moisture observations 
into the Palmer model, the evaluation results of the 
present study suggest that the benefit of soil moisture data 
assimilation comes mainly from the strength of the ob- 
servations, not from the efficiency of a sophisticated as- 
similation technique such as the EnKF. That is, it is shown 
that the EnKF outputs from the Palmer model produce 
lower R(— 1) with NDVI than the benchmark model in 
Eq. (18) that simply combines model forecasts and ob- 
servations using globally fixed weighting parameters. The 
relatively inefficient performance of the EnKF is likely 
linked to inappropriate model and observation error as- 
sumptions underlying the estimation of the Kalman gain. 
This suggests that additional work is still required to 
optimally parameterize large-scale land data assimilation 
systems for agricultural drought monitoring. 

In spite of significant advances in developing soil 
moisture data assimilation systems, there are still 
limitations in evaluating them objectively for further 
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improvements. This study suggests a simple but ef- 
fective evaluation approach using statistical bench- 
mark models and successfully pinpoints weaknesses 
in each component of a soil moisture data assimila- 
tion system. 
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